0. Objectives
1. Introduction
2. Rare Genetic Disorders and Common Complex Diseases
  2-1. Single Nucleotide Variant (SNV)
  2-2. Haplotype
  2-3. Linkage Disequilibrium (LD)
  2-4. Tag SNP
  2-5. Hardy-Weinberg Equilibrium (HWE)
3. New Paradigms in Gene Research
4. Single Nucleotide Variant Association Test and its Statistical Genetic Models
  4-1. Fisher’s Exact Test
  4-2. Chi-squared test
  4-3. Cochran-Armitage Trend Test (CATT)
  4-4. Regression Analysis
  4-5. Hardy-Weinberg Equilibrium (HWE) Test
  4-6. Manhattan Graph

0. Objectives

This chapter builds an understanding of the five main steps in the value chain of the genomics industry: sampling, sequencing, data analysis, data interpretation, and clinical application. Based on this knowledge we get a better view of the current landscape of the genomics industry, which comprises five fields: rare disease genetics, cancer genomics, perinatal genetics, pharmacogenomics, and lifestyle/chronic disorders and health state. We discuss how the field is tackling the goals of ‘genomics for all’ and ‘health improvement over the entire population’, and along the way learn the basic genetic concepts behind single variant-single phenotype association studies. The chapter also covers the Mendelian laws of genetics and linkage disequilibrium, as well as the statistical tests commonly used in case-control association studies: Fisher’s exact test, the Chi-squared test, the Cochran-Armitage Trend Test, and the Hardy-Weinberg Equilibrium Test.

1. Introduction

In 1953, Watson and Crick discovered the double-helical structure of the DNA molecule. It was a triumph for Mendelian genetics in that there was indeed a mediating substance that carries inherited information from parents to offspring. In 2003, 50 years after Watson and Crick’s discovery, the Human Genome Project announced that it had decoded the ~3 billion bases of the human genome. There were rosy speculations and expectations that humanity might now conquer all diseases, but that was not the case.

The genomic data analysis industry has kept expanding ever since. The value chain goes like this: (1) retrieving relevant samples, (2) sequencing, (3) data analysis, (4) data interpretation, (5) translational and clinical application. In the initial growth phase of the industry, most of the value was concentrated in sequencing technology, as the case of Illumina shows. As the industry matures this is set to change, and steps (3), (4), and (5) are bound to create more of the value. The added value grows toward the later steps because the costs of sampling and sequencing themselves keep falling. As of now, data analysis is expanding rapidly (possibly at its prime?) while clinical applications are still in their infancy. Genomic data analysis is the toughest bottleneck for now, and genomic data interpretation will create more and more added value. Genomic data interpretation covers clinical analysis reports, linking genomic data with health and medical records for interpretation, and interpreting customized, personalized genomic data and delivering the result to clients as a product.

From “Genomics in the UK: An Industry Study for the Office of Life Sciences, 2015”

For the time being, genomic data analysis and interpretation are being applied to problems such as rare diseases, cancer, and perinatal diagnostics (perinatal: the period from pregnancy to 7 days after birth). The area of application is expanding and is expected to cover all medical fields soon. However, this prospect still has limitations to overcome.

  1. The clinical relevance of genomic data analysis results for clinical decision-making is still fairly low in most disease and health-related fields.

  2. Our understanding of complex diseases, and of the genomics of healthy (normal) people, is still in its infancy.

Current interests in genomic data analysis industry can be classified into five broad categories:

  1. rare diseases

  2. cancer

  3. perinatal genetics

  4. pharmacogenomics (which drugs to use for individuals with specific genomic or expression profile)

  5. “lifestyle”, chronic diseases and well-being

Precision medicine, which aims at personalized medicine for everyone, sits in the upper-right corner, marked with a star. That’s the holy grail and diamond standard of the industry, but the reality is an inverse linear relationship, as in the picture below.

Rare diseases are almost entirely dictated by the genotype itself: genotype determines phenotype. However, the applicable target population is less than 1% even if you pool all kinds of rare diseases together. This means the market is rather small for a profitable and thriving industry.

On the other hand, products that target well-being and the general population, like the DTC (direct-to-consumer) disease risk prediction provided by 23andMe, cover most of the population, but their clinical and medical relevance is very low. The well-being market sits in the bottom-right corner, and rare diseases in the upper-left corner.

Neither simplistic approach is feasible: raising the medical credibility of the well-being region (bottom-right corner) up to the deterministic level of rare diseases, or extending the reach of genomic data analysis in rare diseases (upper-left corner) to a wider population. The former runs into complex traits, which involve hundreds or even thousands of genes interacting with each other in a web-like fashion, together with environmental influences. The latter applies only to the rarest diseases, the “Mendelian diseases” that are determined by a single gene.

So we need a newer genomics. Our understanding of complex diseases is still at a basic level, and the simplistic approach of increasing the number of observed samples (N) and the observation period (time) won’t solve the problem. Polymorphic loci in DNA comprise about 30 million bases, roughly 1% of the total nucleotides in the genome. DNA therefore has the potential to create \(2^{30000000}\) combinations of polymorphisms. Even if we consider only a hundred thousand genes in coding regions, or only the 10000 most important loci, this still yields about \(2^{10000}\) possible combinations. It’s an astronomical number. Or rather, it’s a genomical number! Even analyzing the entire population on earth is not enough to observe all possible combinations, so most of the potential combinations have not yet been observed. Unlike rare diseases, where the genotypes of only a handful of genes deterministically decide the phenotype, complex disease phenotypes are more like “emergent” properties that arise from the interaction among many genes and their polymorphisms.
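To get a feel for the size of \(2^{10000}\), the arithmetic can be sketched in R. The world population figure of about 8 billion is an assumption added for illustration and is not taken from the text.

```r
# Number of combinations if each of n loci independently takes one of two states: 2^n.
n_loci <- 10000
digits_in_combinations <- n_loci * log10(2)   # log10(2^n) = n * log10(2)

# Assumption: roughly 8 billion people alive (illustrative, not from the text).
world_population <- 8e9
bits_covered_by_population <- log2(world_population)

cat("2^10000 is about 10^", round(digits_in_combinations), "\n", sep = "")
cat("8e9 people is about 2^", round(bits_covered_by_population), "\n", sep = "")
```

Under these assumptions, \(2^{10000}\) is roughly \(10^{3010}\), while the whole human population corresponds to only about \(2^{33}\) genomes, which is why sampling more people alone cannot cover the space of combinations.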

There is also the question of whether the current scope of genomic data analysis, limited to rare diseases, cancers, and prenatal testing, is really the focus and essence of the genomics problem. Cancer genetics deals with somatic mutations, which are unstable by nature and ever-changing, totally unlike the germline variants that underlie complex trait diseases and stay the same throughout life. They follow different principles. Rare disease and perinatal diagnostics and analyses are closer to traditional genetics than to modern genomics. We do use NGS technology to obtain the data, but the actual interpretation still remains at the level of classical genetics, where one analyzes individual genes and variants. We are still living in the paradigm of Mendelian genetics and molecular biology, where individual genes determine phenotypes. So it may be more fitting to call it “genetics using genomics technology” rather than genomics itself. For example, perinatal diagnostics uses non-invasive, cutting-edge NGS technology to obtain sequencing and variant data, but the underlying principle is still basically cytogenetic analysis.

So we need a better, systematic understanding of the “omics” of emergent properties in truly genomic problems like complex diseases and various health states.

Pharmacogenomics takes a very special place in the field of genomics:

  1. The entire population may be addressed as the sample population, since almost everyone takes some form of medication in their lifetime.

  2. Metabolism and effectiveness of drugs are determined by ADME (absorption, distribution, metabolism, and excretion), which are in turn regulated by individual genotypes and phenotypes that pertain to the individual differences in ADME and also pharmacodynamics. Pharmacodynamics refers to the regulation of target protein function through mechanisms that modify the molecular function of transporters, channels, enzymes and target proteins.

So pharmacogenomics is a unique and close-to-optimal genomics research model, with both a sufficient number of samples and a specific genotype-phenotype relationship. It is also clear that individual differences in drug response and sensitivity do not depend on single gene variants but are determined systematically through numerous interactions among the many genes involved in pharmacokinetics and pharmacodynamics. Additionally, the phenotype presents itself only when the drug is administered. The author therefore thinks pharmacogenomics will develop into a representative research field of germline-based genomics that deepens our understanding of the interaction between environment (“drugs”) and genes, because it is far less complex than well-being research as a genomics problem, and because scientific reproducibility is guaranteed above a certain level.

2. Understanding the Genetics of Rare Genetic Disorders and Complex Trait Diseases

  • Darwin: theorized the concept of evolution and inheritance in organisms. Couldn’t explain the exact mechanisms of inheritance from one generation to the next.

  • Mendel: discovered three laws of inheritance through experiments

    • law of segregation
    • law of independent assortment
    • law of dominance
  • Francis Galton: believed that Darwin’s idea could be explained only through quantitative analyses of diversity in genetic variants. This was the beginning of biometrics and the biometrician school, as opposed to the Mendelian school of thought.

The left panel shows the normal distribution of a common trait shaped by multiple common variants. The right panel illustrates the Mendelian school of thought. The two eventually came to be one. Pearson and Fisher were statisticians and geneticists active at the time, as disciples of Francis Galton. Ronald Fisher set out to integrate Galton’s regression model (Galton discovered regression and correlation) and various quantitative analytical methods with the Mendelian laws of inheritance, which established the genetics of quantitative traits. Quantitative genetics goes a step beyond discrete, categorical (seemingly qualitative) traits and explores continuous quantitative traits such as height. It usually involves multivariable analysis: if univariate analysis is applied to qualitative traits, quantitative traits call for multivariate analysis.

The 1970s saw the advent of Sanger sequencing and of linkage analysis for pedigrees. This led to the successful identification of disease genes. The laws of inheritance include the three Mendelian laws described above plus the law of linkage. The correlation between genotype and phenotype is the oldest research topic in genetics. Association analysis, which assumes independent assortment, and linkage analysis, which assumes the law of linkage, are key to understanding the genotype-phenotype relationship.

NGS, however, made us realize that there are too many genotypes to consider and that the variation among individuals and races is bigger than expected. Research on complex diseases like hypertension and diabetes mellitus is only just getting off the ground. The amount of data generated by NGS calls for a trend in genomics that transcends traditional genetics. In this chapter we get acquainted with the basic concepts required for the genomic analysis of rare genetic disorders and common complex diseases, while discussing novel interpretations of conventional notions and how to improve them.

2-1. Single Nucleotide Variant

A somatic mutation in a single nucleotide (base) is a point mutation. A germline variation in a single nucleotide is called a single nucleotide variant (SNV). If such a variant is observed in over 1% of the population, it can account for individual-to-individual differences in traits regardless of any disease process, and it is called a SNP (single nucleotide polymorphism). This threshold was later lowered to 0.5%, because variants turned out to occur at more than 1% of bases, unlike what researchers had first thought. Nowadays, the term “SNV” is preferred over point mutation or polymorphism as a more inclusive and neutral term. A variant with a frequency over 0.5% is called a common variant; those that occur less frequently are called rare variants. A variant that is not inherited from the parents but arises by de novo mutation in germline DNA is sometimes called a private variant.

2-2. Haplotype

Humans have 46 chromosomes, or 23 pairs. A genome with two sets of the same (similar) chromosomes is called a diploid genome. Ploidy higher than diploid also exists in nature. The number of chromosome sets a cell possesses is called its ploidy, and humans are diploid organisms. Haploid refers to the single set of chromosomes that a gamete carries, that is, half of the complete set. Since human zygotes are diploid, the haploid set of a human cell is also monoploid.

Haploid + genotype = haplotype

A haplotype is the combination of alleles linked on the same chromosome. The next figure shows haplotypes C1 and C2, as determined by SNP1 and SNP2.

Haplotype phasing with a SNP chip is rather difficult.

Sequencing is better for haplotype phasing.

Current NGS technology sequences short reads of around 100~300 bases, so estimating a large haplotype is still difficult. Haplotype phasing is why we need long-read sequencing such as PacBio’s SMRT or nanopore sequencing. In statistical genetics, a haplotype means a set of SNPs on the same chromosome. A set of SNPs that are strongly linked together is called a haplotype block, and such blocks can be used to call the correct haplotype even without sequencing the entire genome.

2-3. Linkage Disequilibrium (LD)

Most of the hardship in genomic analysis comes from the fact that genes or loci are physically linked in sequential order on a chromosome. The individual genes on a chromosome are therefore not independent at all and tend to be inherited together. During meiosis, chromosomal crossover can separate genes on the same chromosome onto different chromosomes. However, the closer together two genes are, the more likely they are to be passed down together to descendants. The measure of this is called genetic linkage. Crossover does not occur at random either: crossover hotspots are irregularly distributed throughout the genome. It is therefore rather difficult to apply the independent assortment assumption to the statistical analysis of genes and loci that are linked with irregular distribution and linkage probability.

LD (linkage disequilibrium) means that the frequency of a haplotype across loci differs from the value predicted from the allele frequencies at each locus. That is, the alleles at the two or more loci are associated with each other. African populations are commonly said to have gone through more generations than other populations, so fewer regions of LD remain in them. When recombination occurs between two SNPs, the LD between them breaks down and they move toward linkage equilibrium (non-LD). Non-LD means that the alleles at the two loci occur together no more often than expected by chance.
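The definition above can be made concrete with the usual LD coefficient \(D = f(AB) - f(A)f(B)\) and its normalized form \(r^2\). The following is a minimal sketch in R; the haplotype counts and the allele labels A/a and B/b are hypothetical values chosen for illustration, not data from the text.

```r
# Hypothetical haplotype counts for two biallelic SNPs (alleles A/a and B/b);
# illustrative numbers only.
hap_counts <- c(AB = 474, Ab = 26, aB = 126, ab = 374)
hap_freq <- hap_counts / sum(hap_counts)

p_A <- hap_freq[["AB"]] + hap_freq[["Ab"]]   # frequency of allele A
p_B <- hap_freq[["AB"]] + hap_freq[["aB"]]   # frequency of allele B

# D: observed AB haplotype frequency minus the frequency expected
# if the two loci were independent
D <- hap_freq[["AB"]] - p_A * p_B

# r^2: squared correlation between the alleles at the two loci
r2 <- D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))

cat("D =", D, " r^2 =", r2, "\n")
```

A value of D equal to 0 corresponds to linkage equilibrium; the further \(r^2\) is from 0, the better one locus predicts the other.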

2-4. Tag SNP

If LD is strong among variants, one can infer the presence of the other variants from the presence of one variant and thus determine the haplotype. A specific allele that can serve as a proxy for a certain haplotype or for the presence of other variants is called a tag SNP. The figure below shows an example of calling 4 different haplotypes from 6 tag SNPs. These four haplotypes were enough to describe 90% of the haplotypes in the entire population. The SNPs reported by association studies are tag SNPs in many cases.

Genetics research is about locating the causal genetic variation. However, it is rather difficult to search the entire chromosomal DNA for such a locus. We therefore use tag SNPs that are in LD with the unobserved causal locus responsible for the disease phenotype. The typed marker locus is thus said to be in indirect association with the disease phenotype.

2-5. Hardy-Weinberg Equilibrium (HWE)

In 1908, Hardy and Weinberg independently came up with the principle that describes the equilibrium of a population’s gene pool. It is called the Hardy-Weinberg principle. If a population has no mutation, mating is random, and there is no influx of genes from outside the group (the full statement also assumes no selection and a very large population), the genotype and allele frequencies stay in equilibrium over time.

\[f(A)=p, f(a)=q\]

\[f(AA)=p^2, f(Aa)=2pq, f(aa)=q^2\]

\[(p+q)^2=p^2+2pq+q^2\]

The HWE test is used to test for population stratification, where a systematic difference in allele frequencies between subpopulations in a population exists possibly due to different ancestry…1 and also to test for non-random mating.

If the genotype frequencies in a population do not follow HWE, the random-mating assumption may be violated. This can occur naturally through sexual selection. A simple Chi-squared test can determine whether HWE is violated. For example, HWE violations occur in (1) a group sampled from a single ancestry or household, (2) inbreeding, (3) assortative mating (mating between individuals that have similar traits), (4) a mixed group of different races, and (5) data errors. In (1)~(3) you cannot simply assume random mating. Violations in a well-composed population group are usually due to data errors. The population stratification (population structure) problem in (4) is very complicated and has to be addressed in modern GWAS research. Also, sex chromosomes show inheritance patterns different from autosomes, so we need analysis methods that reflect those differences.
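The Chi-squared check for HWE mentioned above can be sketched in a few lines of base R. The genotype counts are hypothetical, and the variable names are chosen only for this illustration. Expected counts come from \(p^2\), \(2pq\), \(q^2\) with \(q = 1 - p\), and the test has one degree of freedom (three genotype classes, minus one, minus one estimated allele frequency).

```r
# Hypothetical observed genotype counts at one biallelic locus (illustrative only)
obs <- c(AA = 298, Aa = 489, aa = 213)
n <- sum(obs)

# Allele frequencies estimated from the genotype counts
p <- (2 * obs[["AA"]] + obs[["Aa"]]) / (2 * n)
q <- 1 - p

# Genotype counts expected under Hardy-Weinberg equilibrium
expected <- n * c(AA = p^2, Aa = 2 * p * q, aa = q^2)

# Chi-squared goodness-of-fit statistic with 1 degree of freedom
chi_sq <- sum((obs - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

print(round(expected, 1))
cat("chi-squared =", chi_sq, " p-value =", p_value, "\n")
```

A small p-value suggests a departure from HWE, which in practice may point to any of the causes (1)~(5) listed above, including genotyping errors.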

Hardy-Weinberg Equilibrium in the Locus with Two Alleles

3. New Paradigms in Genomics Research

The Obama administration started the ‘Precision Medicine Initiative’ in 2015 with 215 million dollars of federal budget. Its short-term goals were prevention and better treatment through improvements in cancer genomics. Its long-term goals were to form a nationwide network of scientists to enhance the understanding of diseases and health, and to establish an integrated knowledge database through research on data gathered from a cohort of a million (1,000,000) Americans. This was renamed “All of Us” in 2016. The All of Us project is accompanied by 100 partners, including Verily Life Sciences, a Google biosciences start-up that came out of Google’s moonshot program. The yearly budget increased from 130 million dollars in 2016 to 230 million in 2017 and 290 million in 2018.

The UK is also going all out. In 2012, then-PM David Cameron launched a nationwide project to sequence 100,000 British genomes, run by Genomics England. Whole genome sequencing of the hundred thousand samples was completed in 2018. Cooperation with external organizations and individuals through public access to the data became official in August 2019. Meanwhile, UK Biobank publicly released all of its WES data and clinical data in March 2019.

This whole trend began back in 2007 with the MyCode initiative started by Geisinger Health System, a regional health service system based in Pennsylvania and New Jersey. It began as a biobank in 2007, but developed further into the DiscovEHR project, which integrates Electronic Health Records containing long-term follow-up data with genomics data in cooperation with Regeneron Pharmaceuticals. About 200,000 patients had provided genetic samples by 2018. DiscovEHR aims to gather data from 250,000 patients, and so far WES of 180,000 patients has been completed. In the WES data, they identified 4,028,206 unique SNVs and indels, about 98% of which had an allele frequency below 1%, and over 176,000 of which are loss-of-function variants. This is quite comparable to the 1000 Genomes Project, where variants with a frequency below 1% made up about 81.2% of the total (68850471/84739838).

NGS has changed the paradigm of genetics research. Traditional cohort studies in genetics focused on gene-environment interactions and followed a control group and an experimental group (exposed to a certain environment) over a long period of time. Nowadays, because germline DNA does not change and NGS has drastically cut the cost of high-throughput research, one can reconstruct the environmental exposure of a case-control study from long-term follow-up data. Electronic phenotypes are another new element.

4. Single Nucleotide Variant Association Study and Statistical Genetics Models

GWAS (Genome-Wide Association Study) is a variant-phenotype association study based on an additive model of single nucleotide variants. It tests the association between each individual variant and an individual phenotype, using linear regression for a continuous phenotype or logistic regression for a binary phenotype. It tests hundreds of thousands of loci or more, which can produce many type 1 errors (false positives). We therefore need multiple hypothesis correction such as the Bonferroni correction or the Benjamini-Hochberg false discovery rate. In addition, experimental artifacts of the SNP chip sometimes bias the data systematically, so Mendelian law, which serves as the true model, is very important for reducing false positives. But a GWAS over a large population rarely comes with an accurate pedigree. Linkage analysis, which requires a pedigree, is usually applied to rare diseases caused by one or a few genes with high penetrance, and it works best when the trait follows Mendelian inheritance. Common, complex traits, where many genes are involved and gene-environment interactions also have to be considered, are usually studied with association studies.
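Both corrections named above are available in base R through p.adjust. The sketch below applies them to five hypothetical p-values; the values and SNP labels are made up for illustration and do not come from any data set in this chapter.

```r
# Hypothetical raw p-values from five single-variant tests (illustrative only)
p_raw <- c(snp1 = 1e-9, snp2 = 3e-4, snp3 = 0.012, snp4 = 0.04, snp5 = 0.6)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate instead
p_bh <- p.adjust(p_raw, method = "BH")

print(data.frame(p_raw, p_bonf, p_bh))
```

Bonferroni is the stricter of the two, so its adjusted p-values are never smaller than the Benjamini-Hochberg ones.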

Modern GWASs assume a significance level of 5% and about a million multiple hypothesis tests. In this case, the significance threshold is \(5 \times 10^{-8}\). Single-variant association tests have weaker statistical power for rare variants than for common variants when the effect sizes are the same. For example, for MAF (minor allele frequency) = 0.1, 0.01, 0.001 and an assumed odds ratio of 1.4 for the variant, we need 6400, 54000, and 540000 samples respectively to achieve 80% statistical power. Because rare variants far outnumber common variants, the significance level also has to be corrected for multiple hypothesis testing. In summary, single-variant association analysis of rare variants gives good results when the sample size is large enough, when the effect size is very large, or when the rare variant has a fairly high frequency. Heterogeneity within a population is still a tough problem, and has to be handled carefully with techniques such as genomic control analysis.
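The genome-wide threshold quoted above is simply the Bonferroni correction of the 5% level for roughly one million tests, with \(\alpha\) the overall significance level and \(m\) the number of tests:

\[\alpha_{\text{per test}} = \frac{\alpha}{m} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}\]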

Despite such limitations, single-variant association studies of rare variants are still useful, and they can be combined with Q-Q plots, genomic control analysis, Manhattan graphs, etc. for quality evaluation, batch effect control, population structure analysis, and so on. PRS (Polygenic Risk Score) shows fair predictive power for quantitative complex traits without an accurate pedigree.

In this chapter, we learn the different statistical genetic models used in single variant association studies.

4-1. Fisher’s Exact Test

df <- data.frame(allele=c("Case", "Control", "Total"), A=c(10, 66, 76), G=c(12, 26, 38), total=c(22, 92, 114))
df
##    allele  A  G total
## 1    Case 10 12    22
## 2 Control 66 26    92
## 3   Total 76 38   114
row.names(df) <- c("Case", "Control", "Total")
df
##          allele  A  G total
## Case       Case 10 12    22
## Control Control 66 26    92
## Total     Total 76 38   114
Allele     A   G  Total
Case      10  12     22
Control   66  26     92
Total     76  38    114
# Allele counts, filled by column: Case = (G = 12, A = 10), Control = (G = 26, A = 66)
fisher.test(matrix(c(12, 10, 26, 66), nrow=2))
## 
##  Fisher's Exact Test for Count Data
## 
## data:  matrix(c(12, 10, 26, 66), nrow = 2)
## p-value = 0.02452
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.049880 8.865787
## sample estimates:
## odds ratio 
##     3.0129
Dominant   AA  AG+GG  Total
Case        5      6     11
Control    23     23     46
Total      28     29     57

Recessive  AA+AG  GG  Total
Case           5   6     11
Control       43   3     46
Total         48   9     57
# Dominant model, filled by column: Case = (AG+GG = 6, AA = 5), Control = (AG+GG = 23, AA = 23)
fisher.test(matrix(c(6, 5, 23, 23), nrow=2))
## 
##  Fisher's Exact Test for Count Data
## 
## data:  matrix(c(6, 5, 23, 23), nrow = 2)
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.2617789 5.7178442
## sample estimates:
## odds ratio 
##    1.19617
# Recessive model, filled by column: Case = (GG = 6, AA+AG = 5), Control = (GG = 3, AA+AG = 43)
fisher.test(matrix(c(6, 5, 3, 43), nrow=2))
## 
##  Fisher's Exact Test for Count Data
## 
## data:  matrix(c(6, 5, 3, 43), nrow = 2)
## p-value = 0.0008184
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##    2.505229 130.457346
## sample estimates:
## odds ratio 
##   15.77279

4-2. Chi-squared test

Pearson’s chi-squared (\(\chi^2\)) test is also used frequently along with Fisher’s exact test. Here, every expected frequency in the contingency table should be at least 5. The code still runs when an expected frequency is below 5, but with a warning. The Chi-squared test works well with large samples, but not with small ones. With Yates’ continuity correction, which R applies to 2 by 2 tables by default, the p-value is also more conservative.

chisq.test(matrix(c(6, 5, 3, 43), nrow=2))
## Warning in chisq.test(matrix(c(6, 5, 3, 43), nrow = 2)): Chi-squared
## approximation may be incorrect
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  matrix(c(6, 5, 3, 43), nrow = 2)
## X-squared = 11.998, df = 1, p-value = 0.0005327
chisq.test(matrix(c(6, 5, 23, 23), nrow=2))
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  matrix(c(6, 5, 23, 23), nrow = 2)
## X-squared = 6.8618e-32, df = 1, p-value = 1
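To see why the first call above triggers the warning, one can inspect the expected counts that chisq.test stores in its result. This is a small sketch using the same recessive-model table; the variable names are chosen only for this illustration.

```r
# Recessive-model table from above, filled by column
tab <- matrix(c(6, 5, 3, 43), nrow = 2)

# suppressWarnings() only hides the small-count warning; the result is unchanged
expected <- suppressWarnings(chisq.test(tab))$expected
print(expected)

# TRUE if the "expected frequency of at least 5" rule is broken in any cell
any(expected < 5)
```

When any expected count falls below 5, Fisher’s exact test from 4-1 is usually the safer choice for the same table.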

4-3. Cochran-Armitage Trend test (CATT)

CATT is a type of trend analysis. It tests the alternative hypothesis (\(H_1\)) that the success rate (here, the proportion of cases) increases or decreases in one direction across ordered groups, such as genotypes ordered by the number of copies of an allele.

So you can test whether there’s a difference between the case and control groups without collapsing the genotypes into a 2 by 2 contingency table, but you can’t get an odds ratio as in the dominant and recessive models. The required R package is coin, and we use the data provided by SNPassoc.
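Before turning to coin and SNPassoc, the idea can be tried on the small example from 4-1 with base R’s prop.trend.test, a chi-squared test for trend in proportions that is commonly used as a Cochran-Armitage test. This is a swapped-in base R function, not the coin workflow of this section. The per-genotype counts are derived from the dominant and recessive tables in 4-1 by subtraction (AG: 0 cases, 20 controls), and the scores 0, 1, 2 assume an additive model counting copies of the G allele.

```r
# Cases and totals per genotype, derived from the tables in 4-1
cases  <- c(AA = 5,  AG = 0,  GG = 6)
totals <- c(AA = 28, AG = 20, GG = 9)   # cases + controls per genotype

# Scores 0, 1, 2 = number of G alleles (additive model, an assumption of this sketch)
trend <- prop.trend.test(x = cases, n = totals, score = c(0, 1, 2))
print(trend)
```

The test has a single degree of freedom because it looks only for a linear trend in the case proportion across the three ordered genotypes.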

# Data and Library Loading for Cochran-Armitage Trend Test
library(SNPassoc)
## Loading required package: haplo.stats
## Loading required package: arsenal
## Loading required package: survival
## Loading required package: mvtnorm
## Registered S3 method overwritten by 'SNPassoc':
##   method            from       
##   summary.haplo.glm haplo.stats
library(coin)
data(SNPs)

The SNPassoc package is an analytical tool for SNP-based whole genome association studies (GWAS) and carries out the most common analyses performed in them. The SNPs data set included in SNPassoc contains selected SNPs and other clinical covariates for cases and controls in a case-control study. Its data.frame contains the following columns:


Column      Description
id          identifier of each subject
casco       case or control status: 0-control, 1-case
sex         gender: Male and Female
blood.pre   arterial blood pressure
protein     protein levels
snp10001    SNP 1
snp10002    SNP 2
snp100036   SNP 36


SNPs[1:10, 1:10]
##    id casco    sex blood.pre  protein snp10001 snp10002 snp10003 snp10004
## 1   1     1 Female      13.7 75640.52       TT       CC       GG       GG
## 2   2     1 Female      12.7 28688.22       TT       AC       GG       GG
## 3   3     1 Female      12.9 17279.59       TT       CC       GG       GG
## 4   4     1   Male      14.6 27253.99       CT       CC       GG       GG
## 5   5     1 Female      13.4 38066.57       TT       AC       GG       GG
## 6   6     1 Female      11.3  9872.46       TT       CC       GG       GG
## 7   7     1 Female      11.9 11132.90       TT       AC       GG       GG
## 8   8     1   Male      12.4 29973.43       TT       AC       GG       GG
## 9   9     1   Male      14.5 31114.29       CT       CC       GG       GG
## 10 10     1 Female      12.2 41768.55       TT       AC       GG       GG
##    snp10005
## 1        GG
## 2        AG
## 3        GG
## 4        GG
## 5        GG
## 6        GG
## 7        AG
## 8        AG
## 9        GG
## 10       AG

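Convert the SNP columns (6 to 40) into SNP class variables. The argument sep="" tells setupSNP that the two alleles in the input have no separator (e.g. TT); the resulting object prints them separated by the delimiter “/” (e.g. T/T).

# Convert columns 6:40 to SNP class; sep="" is the allele separator in the input data
datSNP <- setupSNP(SNPs, 6:40, sep="")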

datSNP
##      id casco    sex blood.pre    protein snp10001 snp10002 snp10003 snp10004
## 1     1     1 Female      13.7  75640.523      T/T      C/C      G/G      G/G
## 2     2     1 Female      12.7  28688.215      T/T      A/C      G/G      G/G
## 3     3     1 Female      12.9  17279.591      T/T      C/C      G/G      G/G
## 4     4     1   Male      14.6  27253.988      C/T      C/C      G/G      G/G
## 5     5     1 Female      13.4  38066.569      T/T      A/C      G/G      G/G
## 6     6     1 Female      11.3   9872.460      T/T      C/C      G/G      G/G
## 7     7     1 Female      11.9  11132.903      T/T      A/C      G/G      G/G
## 8     8     1   Male      12.4  29973.431      T/T      A/C      G/G      G/G
## 9     9     1   Male      14.5  31114.294      C/T      C/C      G/G      G/G
## 10   10     1 Female      12.2  41768.551      T/T      A/C      G/G      G/G
## 11   11     1 Female      11.1  28543.861      C/T      A/C      G/G      G/G
## 12   12     1   Male      13.2   5018.812      C/C      C/C      G/G      G/G
## 13   13     1   Male      15.1  24497.292      C/T      C/C      G/G      G/G
## 14   14     1 Female      12.3  26806.956      T/T      A/C      G/G      G/G
## 15   15     1   Male      11.0  21046.765      T/T      A/C      G/G      G/G
## 16   16     1 Female      12.0  23633.860      T/T      C/C      G/G      G/G
## 17   17     1   Male      13.9  15918.380      C/T      C/C      G/G      G/G
## 18   18     1 Female      11.8  24617.420      T/T      A/C      G/G      G/G
## 19   19     1   Male      13.6  52031.420      T/T      C/C      G/G      G/G
## 20   20     1 Female      11.7 100761.400      T/T      A/C      G/G      G/G
## 21   21     1   Male      12.1  51512.320      T/T      C/C      G/G      G/G
## 22   22     1   Male      12.9  78904.460      C/C      C/C      G/G      G/G
## 23   23     1   Male      13.4  40523.770      C/T      C/C      G/G      G/G
## 24   24     1 Female      13.1  42556.460      T/T      A/C      G/G      G/G
## 25   25     1 Female      12.7  68609.870      T/T      A/C      G/G      G/G
## 26   26     1 Female      11.3  92728.950      C/T      C/C      G/G      G/G
## 27   27     1   Male      13.8  66205.610      T/T      A/C     <NA>      G/G
## 28   28     1 Female      14.0  92614.200      T/T      A/C      G/G      G/G
## 29   29     1   Male      12.8  24441.413      T/T      A/C      G/G      G/G
## 30   30     1   Male      12.1  34759.020      C/C      C/C      G/G      G/G
## 31   31     1 Female      13.9  49911.300      T/T      A/C      G/G      G/G
## 32   32     1 Female      13.6  54359.180      T/T      A/C      G/G      G/G
## 33   33     1   Male      13.8  80773.230      C/T      A/C      G/G      G/G
## 34   34     1 Female      13.8  36327.250      T/T      A/A     <NA>      G/G
## 35   35     1 Female      12.7  38332.620      C/T      C/C      G/G      G/G
## 36   36     1 Female      13.8   1615.229      C/T      C/C      G/G      G/G
## 37   37     1   Male      11.7   5707.000      C/C      C/C      G/G      G/G
## 38   38     1 Female      12.8  10661.248      C/T      A/C      G/G      G/G
## 39   39     1 Female      11.7  74966.958      C/T      A/C      G/G      G/G
## 40   40     1   Male      12.5 123334.051      T/T      C/C      G/G      G/G
## 41   41     1 Female      11.2  27926.123      T/T      A/C      G/G      G/G
## 42   42     1 Female      12.3  19044.707      T/T      C/C      G/G      G/G
## 43   43     1 Female      12.9  36305.392      T/T      A/C      G/G      G/G
## 44   44     1   Male      12.4  23358.728      C/T      A/C      G/G      G/G
## 45   45     1 Female      12.5  40786.050      T/T      A/C     <NA>      G/G
## 46   46     1   Male      11.9  20215.439      C/C      C/C      G/G      G/G
## 47   47     1   Male      11.7  16585.020      C/T      A/C      G/G      G/G
## 48   48     1 Female      13.0  34140.448      C/T      C/C      G/G      G/G
## 49   49     1 Female      13.1  17387.705      C/T      C/C      G/G      G/G
## 50   50     1 Female      11.0  33860.403      T/T      A/C      G/G      G/G
## 51   51     1   Male      11.9  34660.660      T/T      C/C      G/G      G/G
## 52   52     1   Male      12.3  37781.028      C/T      A/C      G/G      G/G
## 53   53     1   Male      12.1  64065.219      C/T      C/C     <NA>      G/G
## 54   54     1   Male      13.7  30279.832      T/T      C/C      G/G      G/G
## 55   55     1 Female      12.4  54774.460      T/T      A/C      G/G      G/G
## 56   56     1   Male      12.6 104938.404      T/T      C/C      G/G     <NA>
## 57   57     1   Male      13.4  57118.610      T/T      A/C      G/G      G/G
## 58   58     1   Male      13.3  26099.165      T/T      C/C      G/G      G/G
## 59   59     1   Male      13.0  21105.036      C/T      A/C      G/G      G/G
## 60   60     1   Male      12.8  29139.667      T/T      C/C      G/G      G/G
## 61   61     1   Male      12.3  21094.725      T/T      C/C      G/G      G/G
## 62   62     1 Female      13.9  65277.924      T/T      A/C      G/G      G/G
## 63   63     1 Female      12.0  67345.915      T/T      A/C      G/G      G/G
## 64   64     1   Male      12.5  46026.201      T/T      A/C      G/G      G/G
## 65   65     1 Female      12.5  34608.906      T/T      A/C      G/G      G/G
## 66   66     1 Female      13.1  64588.593      T/T      A/A     <NA>      G/G
## 67   67     1   Male      12.7  31814.434      C/C      C/C      G/G      G/G
## 68   68     1   Male      12.4  65195.369      T/T      C/C      G/G      G/G
## 69   69     1   Male      13.1  64530.805      T/T      C/C      G/G      G/G
## 70   70     1 Female      11.9  36309.530      C/T      A/C      G/G      G/G
## 71   71     1 Female      14.2  48915.611      C/T      C/C      G/G      G/G
## 72   72     1 Female      15.2  66875.354      T/T      C/C      G/G      G/G
## 73   73     1   Male      13.8  67572.940      T/T      A/C      G/G      G/G
## 74   74     1   Male      12.9  54682.046      C/T      C/C     <NA>      G/G
## 75   75     1   Male      10.8  31071.443      T/T      A/A     <NA>      G/G
## 76   76     1   Male      13.4  42364.907      T/T      A/C      G/G      G/G
## 77   77     1   Male      15.2  29370.819      T/T      C/C      G/G      G/G
## 78   78     1 Female      11.6  58624.352      T/T      A/A     <NA>      G/G
## 79   79     1 Female      13.8  56557.356      T/T      A/C      G/G      G/G
## 80   80     1   Male      13.2  50196.341      C/T      C/C      G/G      G/G
## 81   81     1   Male      13.9  97563.895      C/T      A/C      G/G      G/G
## 82   82     1   Male      15.2  78234.146      T/T      A/C      G/G      G/G
## 83   83     1   Male      13.0  77400.679      T/T      C/C     <NA>      G/G
## 84   84     1 Female      11.6  62415.960      T/T      A/C      G/G      G/G
## 85   85     1   Male      14.0  50859.770      C/C      C/C      G/G      G/G
## 86   86     1   Male      13.6  31542.004      T/T      C/C      G/G      G/G
## 87   87     1   Male      13.9  14490.361      T/T      A/C      G/G      G/G
## 88   88     1 Female      12.5  29474.013      C/C      C/C      G/G      G/G
## 89   89     1 Female      14.1   8393.596      C/C      C/C      G/G      G/G
## 90   90     1   Male      12.1  93116.519      C/T      C/C      G/G      G/G
## 91   91     1 Female      14.1  66631.818      T/T      C/C      G/G      G/G
## 92   92     1 Female      12.9  60696.146      T/T      A/A     <NA>      G/G
## 93   93     1 Female      13.6  72790.388      C/T      A/C      G/G      G/G
## 94   94     1 Female      11.7  74373.933      T/T      C/C      G/G      G/G
## 95   95     1 Female      14.4  63841.474      T/T      C/C      G/G      G/G
## 96   96     1 Female      13.8  61835.399      T/T      A/C      G/G      G/G
## 97   97     1 Female      11.4  31438.811      T/T      C/C      G/G      G/G
## 98   98     1   Male      12.4  13553.366      T/T      C/C      G/G      G/G
## 99   99     1   Male      13.0  42373.162      C/T      A/C      G/G      G/G
## 100 100     1   Male      11.8  61251.687      T/T      A/C      G/G      G/G
## 101 101     1 Female      11.9  33200.442      C/T      A/C      G/G      G/G
## 102 102     1   Male      11.5  26699.855      C/T      C/C      G/G      G/G
## 103 103     1 Female      14.1  49167.859      C/T      C/C      G/G      G/G
## 104 104     1   Male      13.2  73586.738      T/T      A/C      G/G      G/G
## 105 105     1 Female      13.5  27269.806      T/T      A/C      G/G      G/G
## 106 106     1   Male      15.0  24066.689      C/C      C/C      G/G      G/G
## 107 107     1 Female      12.8  17251.810      C/T      A/C      G/G      G/G
## 108 108     1 Female      13.1  54578.806      T/T      A/C      G/G      G/G
## 109 109     1   Male      12.7  43947.831      T/T      A/C      G/G      G/G
## 110 110     1 Female      15.0   3604.205      C/T      A/C      G/G      G/G
## 111 111     0   Male      14.9  28095.170      T/T      C/C      G/G      G/G
## 112 112     0 Female      13.7  24104.020      C/T      A/C      G/G      G/G
## 113 113     0 Female      12.7  25601.290      C/T      A/C      G/G      G/G
## 114 114     0   Male      11.9  30555.660      T/T      A/C      G/G      G/G
## 115 115     0 Female      13.9  19492.970      C/T      A/C      G/G      G/G
## 116 116     0 Female      12.8  18291.340      C/T      A/C      G/G      G/G
## 117 117     0 Female      14.3  21364.150      T/T      C/C      G/G      G/G
## 118 118     0   Male      13.1  28673.850      T/T      A/C      G/G      G/G
## 119 119     0   Male      13.9  38652.400      C/T      C/C      G/G      G/G
## 120 120     0 Female      13.1  51097.920      C/T      C/C      G/G      G/G
## 121 121     0   Male      12.2  24752.520      C/T      A/C      G/G      G/G
## 122 122     0 Female      12.2  24507.110      T/T      C/C      G/G      G/G
## 123 123     0 Female      12.3  14446.170      C/C      C/C      G/G      G/G
## 124 124     0   Male      13.2  29028.500      T/T      C/C      G/G      G/G
## 125 125     0   Male      13.0  45418.760      T/T      A/C      G/G      G/G
## 126 126     0   Male      13.1  68118.220      T/T      A/C      G/G      G/G
## 127 127     0 Female      13.6  12143.120      T/T      C/C      G/G      G/G
## 128 128     0 Female      14.6  12922.120      T/T      A/C      G/G      G/G
## 129 129     0 Female      14.2  26995.870      C/T      A/C      G/G      G/G
## 130 130     0 Female      13.1  66222.080      T/T      C/C      G/G      G/G
## 131 131     0 Female      15.1   6365.480      C/T      A/C      G/G      G/G
## 132 132     0 Female      13.6  25300.880      C/C      C/C      G/G      G/G
## 133 133     0   Male      11.8  67253.250      T/T      C/C      G/G      G/G
## 134 134     0   Male      12.8  53381.980      C/T      C/C      G/G      G/G
## 135 135     0   Male      12.5  23511.920      C/T      A/C      G/G      G/G
## 136 136     0 Female      12.6  83470.570      T/T      A/C     <NA>      G/G
## 137 137     0   Male      12.3  66114.220      C/T      A/C     <NA>      G/G
## 138 138     0   Male      11.7  53601.330      T/T      C/C      G/G      G/G
## 139 139     0   Male      13.0  42122.400      T/T      C/C      G/G      G/G
## 140 140     0   Male      14.3  41022.250      C/T      C/C      G/G      G/G
## 141 141     0 Female      11.8  50971.030      C/T      C/C      G/G      G/G
## 142 142     0   Male      13.0  58445.560      C/T      C/C      G/G      G/G
## 143 143     0 Female      13.0  81160.350      T/T      C/C      G/G      G/G
## 144 144     0 Female      13.1  28886.720      T/T      A/C      G/G      G/G
## 145 145     0 Female      13.5  70010.040      T/T      A/C      G/G      G/G
## 146 146     0   Male      12.0  63611.930      T/T      C/C      G/G      G/G
## 147 147     0 Female      13.8  72174.240      C/T      A/C      G/G      G/G
## 148 148     0 Female      13.4  58326.910      T/T      C/C      G/G      G/G
## 149 149     0 Female      12.9  41654.300      T/T      A/C      G/G      G/G
## 150 150     0   Male      13.6  39197.310      T/T      A/C     <NA>      G/G
## 151 151     0 Female      14.3  42099.950      T/T      C/C      G/G      G/G
## 152 152     0 Female      14.1  36468.510      C/T      A/C      G/G      G/G
## 153 153     0 Female      12.6  39796.810      T/T      A/C      G/G      G/G
## 154 154     0   Male      13.6  30646.260      C/T      A/C      G/G      G/G
## 155 155     0 Female      12.8  30501.830      C/T      A/C      G/G      G/G
## 156 156     0   Male      11.1  13441.880      C/T      A/C      G/G      G/G
## 157 157     0   Male      13.5  47078.160      C/T      C/C      G/G      G/G
##     snp10005 snp10006 snp10007 snp10008 snp10009 snp100010 snp100011 snp100012
## 1        G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 2        A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 3        G/G      A/A      C/C      C/C      A/A       T/T       C/C       G/G
## 4        G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 5        G/G      A/A      C/C      C/C      A/G       T/T       G/G       G/G
## 6        G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 7        A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 8        A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 9        G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 10       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 11       A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 12       G/G      A/A      C/C      G/G      A/A       T/T       G/G       G/G
## 13       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 14       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 15       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 16       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 17       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 18       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 19       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 20       A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 21       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 22       G/G      A/A      C/C      G/G      A/A       T/T       G/G       G/G
## 23       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 24       G/G      A/A      C/C      C/C      A/G       T/T       G/G       G/G
## 25       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 26       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 27       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 28       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 29       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 30       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 31       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 32       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 33       A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 34       G/G      A/A      C/C      C/C      G/G       T/T       G/G       G/G
## 35       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 36       G/G      A/A      C/C      G/G      A/A       T/T       G/G       G/G
## 37       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 38       A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 39       A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 40       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 41       G/G      A/A      C/C      C/C      A/G       T/T       G/G       G/G
## 42       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 43       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 44       A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 45       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 46       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 47       G/G      A/A      C/C      C/G      A/G       T/T       G/G       G/G
## 48       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 49       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 50       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 51       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 52       G/G      A/A      C/C      C/G      A/G       T/T       G/G       G/G
## 53       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 54       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 55       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 56       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 57       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 58       G/G      A/A      C/C      C/C      A/A       T/T       C/G       G/G
## 59       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 60       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 61       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 62       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 63       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 64       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 65       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 66       A/A      A/A      C/C      C/C      G/G       T/T       G/G       C/C
## 67       G/G      A/A      C/C      G/G      A/A       T/T       G/G       G/G
## 68       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 69       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 70       A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 71       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 72       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 73       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 74       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 75       A/A      A/A      C/C      C/C      G/G       T/T       G/G       C/C
## 76       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 77       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 78       A/G      A/A      C/C      C/C      G/G       T/T       G/G       C/G
## 79       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 80       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 81       A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 82       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 83       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 84       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 85       G/G      A/A      C/C      G/G      A/A       T/T       G/G       G/G
## 86       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 87       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 88       G/G      A/A      C/C      G/G      A/A       T/T       G/G       G/G
## 89       G/G      A/A      C/C      G/G      A/A       T/T       G/G       G/G
## 90       G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 91       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 92       A/A      A/A      C/C      C/C      G/G       T/T       G/G       C/C
## 93       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 94       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 95       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 96       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 97       G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 98       A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 99       A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 100      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 101      G/G      A/A      C/C      C/C      A/G       T/T       G/G       G/G
## 102      G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 103      G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 104      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 105      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 106      G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 107      A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 108      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 109      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 110      G/G      A/A      C/C      C/G      A/G       T/T       G/G       G/G
## 111      G/G      A/A      C/C      C/C      A/A      <NA>       G/G       G/G
## 112      A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 113      A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 114      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 115      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 116      A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 117      G/G      A/A      C/C      C/C      A/A      <NA>       G/G       G/G
## 118      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 119      G/G      A/A      C/C      G/G      A/A       T/T       G/G       G/G
## 120      G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 121      A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 122      G/G      A/A      C/C      C/C      A/A       T/T       C/G       G/G
## 123      G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 124      G/G      A/A      C/C      C/C      A/A      <NA>       G/G       G/G
## 125      G/G      A/A      C/C      C/C      A/G       T/T       G/G       G/G
## 126      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 127      G/G      A/A      C/C      C/C      A/A      <NA>       G/G       G/G
## 128      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 129      G/G      A/A      C/C      C/G      A/G       T/T       G/G       G/G
## 130      G/G      A/A      C/C      C/C      A/A      <NA>       G/G       G/G
## 131      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 132      G/G      A/A      C/C      G/G      A/A       T/T       G/G       G/G
## 133      G/G      A/A      C/C      C/C      A/A      <NA>       G/G       G/G
## 134      G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 135      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 136      A/G      A/A      C/C      C/C      A/G       T/T       G/G      <NA>
## 137      A/G      A/A      C/C      C/G     <NA>       T/T       G/G      <NA>
## 138      G/G      A/A      C/C      C/C      A/A      <NA>       G/G       G/G
## 139      G/G      A/A      C/C      C/C      A/A      <NA>       G/G       G/G
## 140      G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 141      G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 142      G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 143      G/G      A/A      C/C      C/C      A/A       T/T       G/G       G/G
## 144      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 145      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 146      G/G      A/A      C/C      C/G      A/A       T/T       G/G       G/G
## 147      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 148      G/G      A/A      C/C      C/C      A/A      <NA>       G/G       G/G
## 149      G/G      A/A      C/C      C/C      A/G       T/T       G/G       G/G
## 150      G/G      A/A      C/C      C/C      A/G       T/T       G/G       G/G
## 151      G/G      A/A      C/C      C/C      A/A      <NA>       G/G       G/G
## 152      A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 153      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 154      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 155      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
## 156      A/G      A/A      C/C      C/G      A/G       T/T       G/G       C/G
## 157      A/G      A/A      C/C      C/C      A/G       T/T       G/G       C/G
##     snp100013 snp100014 snp100015 snp100016 snp100017 snp100018 snp100019
## 1         A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 2         A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 3         A/A       C/C       G/G       G/G       T/T       T/T       C/C
## 4         A/A       A/C       G/G       G/G       T/T       T/T       C/G
## 5         A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 6         A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 7         A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 8         A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 9         A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 10        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 11        A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 12        G/G       C/C       G/G       G/G       T/T       T/T       G/G
## 13        A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 14        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 15        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 16        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 17        A/A       A/C       G/G       G/G       T/T       T/T       C/G
## 18        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 19        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 20       <NA>       C/C       G/G       G/G       C/T       C/T       G/G
## 21        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 22        G/G       C/C       G/G       G/G       T/T       T/T       G/G
## 23       <NA>       A/C       G/G       G/G       T/T       T/T       C/G
## 24        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 25        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 26        A/G       A/C       A/G       G/G       T/T       T/T       C/G
## 27        A/A       A/C       A/G       G/G       C/T       C/T       C/G
## 28        A/A       C/C       G/G       G/G       C/T       C/T       C/G
## 29        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 30       <NA>       C/C       G/G       G/G       T/T       T/T       G/G
## 31        A/A       C/C       G/G       G/G       C/T       C/T       C/G
## 32        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 33       <NA>       C/C       G/G       G/G       C/T       C/T       G/G
## 34        A/A       C/C       G/G       G/G       C/C       C/C       G/G
## 35        A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 36        G/G       C/C       G/G       G/G       T/T       T/T       G/G
## 37        A/A       C/C       G/G       G/G       T/T       T/T       G/G
## 38        A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 39        A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 40        A/A       A/A       A/G       G/G       T/T       T/T       C/C
## 41        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 42        A/A       A/A       A/G       G/G       T/T       T/T       C/C
## 43        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 44        A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 45        A/A       A/C       A/G       G/G       C/T       C/T       C/G
## 46        A/G       C/C       G/G       G/G       T/T       T/T       G/G
## 47        A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 48        A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 49        A/A       A/C       G/G       G/G       T/T       T/T       C/G
## 50        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 51        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 52        A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 53       <NA>       A/C       G/G      <NA>       T/T       T/T       C/G
## 54        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 55        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 56        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 57        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 58        A/A       A/C       G/G       G/G       T/T       T/T       C/C
## 59        A/A       C/C       G/G       G/G       C/T       C/T       G/G
## 60        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 61        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 62        A/A       A/C       A/G       G/G       C/T       C/T       C/G
## 63        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 64        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 65        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 66        A/A       C/C       G/G       G/G       C/C       C/C       G/G
## 67        G/G       C/C       G/G       G/G       T/T       T/T       G/G
## 68        A/A       A/A       A/G       G/G       T/T       T/T       C/C
## 69        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 70        A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 71        A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 72        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 73        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 74       <NA>       A/C       G/G      <NA>       T/T       T/T       C/G
## 75       <NA>       C/C       G/G      <NA>       C/C       C/C       G/G
## 76        A/A       A/C       A/G       G/G       C/T       C/T       C/G
## 77        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 78       <NA>       C/C       G/G      <NA>       C/C       C/C       G/G
## 79        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 80        A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 81        A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 82        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 83       <NA>      <NA>       G/G      <NA>       T/T       T/T       C/C
## 84        A/A       A/C       A/G       G/G       C/T       C/T       C/G
## 85        G/G       C/C       G/G       G/G       T/T       T/T       G/G
## 86        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 87        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 88        G/G       C/C       G/G       G/G       T/T       T/T       G/G
## 89        G/G       C/C       G/G       G/G       T/T       T/T       G/G
## 90        A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 91        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 92        A/A       C/C       G/G       G/G       C/C       C/C       G/G
## 93        A/A       C/C       G/G       G/G       C/T       C/T       G/G
## 94        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 95        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 96        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 97        A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 98        A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 99       <NA>       C/C       G/G       G/G       C/T       C/T       G/G
## 100       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 101       A/A       C/C       G/G       G/G       C/T       C/T       G/G
## 102       A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 103       A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 104       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 105       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 106       A/G       C/C       G/G       G/G       T/T       T/T       G/G
## 107      <NA>       C/C       G/G       G/G       C/T       C/T       G/G
## 108       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 109       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 110       A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 111       A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 112       A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 113       A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 114       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 115       A/A       C/C       G/G       G/G       C/T       C/T       G/G
## 116       A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 117       A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 118       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 119       G/G       C/C       G/G       G/G       T/T       T/T       G/G
## 120       A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 121       A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 122       A/A       A/C       G/G       G/G       T/T       T/T       C/C
## 123       A/G       C/C       G/G       G/G       T/T       T/T       G/G
## 124       A/A       A/A       A/G       G/G       T/T       T/T       C/C
## 125       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 126       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 127       A/A       A/A       A/G       G/G       T/T       T/T       C/C
## 128       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 129       A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 130       A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 131       A/A       C/C       G/G       G/G       C/T       C/T       G/G
## 132       G/G       C/C       G/G       G/G       T/T       T/T       G/G
## 133       A/A       A/A       A/G       G/G       T/T       T/T       C/C
## 134       A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 135       A/A       C/C       G/G       G/G       C/T       C/T       G/G
## 136       A/A       A/C       G/G       G/G      <NA>       C/T       C/G
## 137      <NA>      <NA>       G/G       G/G      <NA>      <NA>       G/G
## 138       A/A       A/A       A/G       G/G       T/T       T/T       C/C
## 139       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 140       A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 141       A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 142       A/G      <NA>       G/G       G/G       T/T       T/T       C/G
## 143       A/A       A/C       G/G       G/G       T/T       T/T       C/C
## 144       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 145       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 146       A/G       A/C       G/G       G/G       T/T       T/T       C/G
## 147       A/A       C/C       G/G       G/G       C/T       C/T       G/G
## 148       A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 149       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 150       A/A      <NA>       G/G       G/G       C/T       C/T       C/G
## 151       A/A       A/A       G/G       G/G       T/T       T/T       C/C
## 152       A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 153       A/A       A/C       G/G       G/G       C/T       C/T       C/G
## 154       A/A       C/C       G/G       G/G       C/T       C/T       G/G
## 155       A/A       C/C       G/G       G/G       C/T       C/T       G/G
## 156       A/G       C/C       G/G       G/G       C/T       C/T       G/G
## 157       A/A       A/C       G/G       G/G       C/T       C/T       C/G
##     snp100020 snp100021 snp100022 snp100023 snp100024 snp100025 snp100026
## 1         G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 2         G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 3         G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 4         G/G       G/G       A/A       T/T       C/T       C/C       G/G
## 5         G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 6         G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 7         G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 8         G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 9         A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 10        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 11        A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 12        A/A       G/G       A/A       T/T       C/C       C/C       G/G
## 13        A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 14        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 15        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 16        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 17        G/G       G/G       A/A       T/T       C/T       C/C       G/G
## 18        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 19        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 20        A/G       G/G      <NA>       A/T       C/T       C/C       G/G
## 21        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 22        A/A       G/G       A/A       T/T       C/C       C/C       G/G
## 23        A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 24        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 25        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 26        A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 27        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 28        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 29        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 30        A/G       G/G       A/A       T/T       C/C       C/C       G/G
## 31        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 32        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 33        A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 34        G/G       G/G       A/A       A/A       T/T       C/C       G/G
## 35        G/G       G/G       A/A       T/T       C/T       C/C       G/G
## 36        A/A       G/G       A/A       T/T       C/C       C/C       G/G
## 37        G/G       G/G       A/A       T/T       C/C       C/C       G/G
## 38        A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 39        A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 40        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 41        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 42        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 43        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 44        A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 45        G/G       G/G       A/A      <NA>       T/T       C/C       G/G
## 46        A/G       G/G       A/A       T/T       C/C       C/C       G/G
## 47        A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 48        A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 49        G/G       G/G       A/A       T/T       C/T       C/C       G/G
## 50        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 51        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 52        A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 53        A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 54        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 55        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 56        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 57        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 58        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 59        G/G       G/G       A/A       A/T       C/T       C/C       G/G
## 60        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 61        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 62        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 63        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 64        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 65        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 66        G/G       G/G       A/A       A/A       T/T       C/C       G/G
## 67        A/A       G/G       A/A       T/T       C/C       C/C       G/G
## 68        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 69        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 70        A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 71        A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 72        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 73        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 74        A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 75        G/G       G/G       A/A       A/A       T/T       C/C       G/G
## 76        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 77        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 78        G/G       G/G       A/A       A/A       T/T       C/C       G/G
## 79        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 80        A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 81        A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 82        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 83        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 84        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 85        A/A       G/G       A/A       T/T       C/C       C/C       G/G
## 86        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 87        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 88        A/A       G/G       A/A       T/T       C/C       C/C       G/G
## 89        A/A       G/G       A/A       T/T       C/C       C/C       G/G
## 90        A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 91        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 92        G/G       G/G       A/A       A/A       T/T       C/C       G/G
## 93        G/G       G/G       A/A       A/T       C/T       C/C       G/G
## 94        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 95        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 96        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 97        G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 98        G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 99        A/G       G/G       A/A      <NA>       C/T       C/C       G/G
## 100       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 101       G/G       G/G       A/A       A/T       C/T       C/C       G/G
## 102       A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 103       A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 104       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 105       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 106       A/G       G/G       A/A       T/T       C/C       C/C       G/G
## 107       A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 108       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 109       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 110       A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 111       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 112       A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 113       A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 114       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 115       G/G       G/G       A/A       A/T       C/T       C/C       G/G
## 116       A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 117       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 118       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 119       A/A       G/G       A/A       T/T       C/C       C/C       G/G
## 120       A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 121       A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 122       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 123       A/G       G/G       A/A       T/T       C/C       C/C       G/G
## 124       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 125       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 126       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 127       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 128       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 129       A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 130       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 131       G/G       G/G       A/A       A/T       C/T       C/C       G/G
## 132       A/A       G/G       A/A       T/T       C/C       C/C       G/G
## 133       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 134       A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 135       G/G       G/G       A/A       A/T       C/T       C/C       G/G
## 136       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 137       A/G       G/G       A/A      <NA>      <NA>       C/C      <NA>
## 138       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 139       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 140       A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 141       A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 142       A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 143       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 144       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 145       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 146       A/G       G/G       A/A       T/T       C/T       C/C       G/G
## 147       G/G       G/G       A/A       A/T       C/T       C/C       G/G
## 148       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 149       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 150       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 151       G/G       G/G       A/A       T/T       T/T       C/C       G/G
## 152       A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 153       G/G       G/G       A/A       A/T       T/T       C/C       G/G
## 154       G/G       G/G       A/A       A/T       C/T       C/C       G/G
## 155       G/G       G/G       A/A       A/T       C/T       C/C       G/G
## 156       A/G       G/G       A/A       A/T       C/T       C/C       G/G
## 157       G/G       G/G       A/A       A/T       T/T       C/C       G/G
##     snp100027 snp100028 snp100029 snp100030 snp100031 snp100032 snp100033
## 1         C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 2         C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 3         C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 4         C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 5         C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 6         C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 7         C/G       C/T       G/G       A/A       T/T      <NA>       A/G
## 8         C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 9         C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 10        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 11        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 12        C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 13        C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 14        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 15        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 16        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 17        C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 18        G/G       T/T       G/G       A/A       T/T       G/G       G/G
## 19        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 20        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 21        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 22        C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 23        C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 24        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 25        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 26        C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 27        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 28        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 29        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 30        C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 31        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 32        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 33        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 34        G/G       T/T       G/G       A/A       T/T       G/G       G/G
## 35        C/C       C/T       G/G       A/A       T/T       A/G       A/G
## 36        C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 37        C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 38        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 39        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 40        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 41        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 42        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 43        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 44        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 45        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 46        C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 47        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 48        C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 49        C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 50        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 51        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 52        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 53        C/C       C/T       A/G       A/A      <NA>       G/G       A/G
## 54        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 55        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 56        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 57        C/G       C/T       G/G       A/A       T/T       A/G      <NA>
## 58        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 59        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 60        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 61        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 62        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 63        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 64        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 65        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 66        G/G       T/T       G/G       A/A      <NA>       G/G       G/G
## 67        C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 68        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 69        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 70        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 71        C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 72        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 73        C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 74        C/C       C/T       A/G       A/A      <NA>       A/G      <NA>
## 75        G/G       T/T       G/G       A/A      <NA>       G/G       G/G
## 76        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 77        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 78       <NA>       T/T       G/G       A/A      <NA>       G/G       G/G
## 79        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 80        C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 81        C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 82        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 83        C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 84        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 85        C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 86        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 87        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 88        C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 89        C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 90        C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 91        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 92        G/G       T/T       G/G       A/A       T/T       G/G       G/G
## 93        C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 94        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 95        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 96        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 97        C/C       C/C       G/G       A/A       T/T       A/A       A/A
## 98        C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 99        C/G      <NA>       A/G       A/A       T/T       G/G      <NA>
## 100       C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 101       C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 102       C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 103       C/C       C/T       A/G       A/A       T/T       A/G       A/G
## 104       C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 105       C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 106       C/C       T/T       A/A       A/A       T/T       G/G       G/G
## 107       C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 108       C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 109       C/G       C/T       G/G       A/A       T/T       A/G       A/G
## 110       C/G       T/T       A/G       A/A       T/T       G/G       G/G
## 111       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 112       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 113       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 114       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 115       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 116       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 117       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 118       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 119       C/C       T/T       A/A       A/A      <NA>       G/G       G/G
## 120       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 121       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 122       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 123       C/C       T/T       A/A       A/A      <NA>       G/G       G/G
## 124       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 125       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 126       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 127       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 128       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 129       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 130       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 131       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 132       C/C       T/T       A/A       A/A      <NA>       G/G       G/G
## 133       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 134       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 135       C/G       T/T       G/G       A/A      <NA>       G/G       G/G
## 136       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 137      <NA>       T/T      <NA>       A/A      <NA>       G/G      <NA>
## 138       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 139       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 140       C/C       C/T       A/G       A/A      <NA>       A/G       A/G
## 141       C/C       C/T       A/G       A/A      <NA>       A/G       A/G
## 142       C/C       C/T       A/G       A/A      <NA>       A/G      <NA>
## 143       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 144       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 145       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 146       C/C       C/T       A/G       A/A      <NA>       A/G       A/G
## 147       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 148       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 149       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 150       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 151       C/C       C/C       G/G       A/A      <NA>       A/A       A/A
## 152       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 153       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
## 154       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 155       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 156       C/G       T/T       A/G       A/A      <NA>       G/G       G/G
## 157       C/G       C/T       G/G       A/A      <NA>       A/G       A/G
##     snp100034 snp100035
## 1         T/T       T/T
## 2         T/T       T/T
## 3         T/T       T/T
## 4         C/T       T/T
## 5         T/T       T/T
## 6         T/T      <NA>
## 7         T/T       T/T
## 8         T/T       T/T
## 9         C/T       T/T
## 10        T/T       T/T
## 11        C/T       T/T
## 12        C/C       T/T
## 13        C/T       T/T
## 14        T/T       T/T
## 15        T/T       T/T
## 16        T/T       T/T
## 17        C/T       T/T
## 18        T/T       T/T
## 19        T/T       T/T
## 20        C/T       T/T
## 21        T/T       T/T
## 22        C/C       T/T
## 23        C/T      <NA>
## 24        T/T       T/T
## 25        T/T       T/T
## 26        C/T       T/T
## 27        T/T       T/T
## 28        T/T       T/T
## 29        T/T       T/T
## 30        C/C       T/T
## 31        T/T       T/T
## 32        T/T       T/T
## 33        C/T       T/T
## 34        T/T       T/T
## 35        T/T       T/T
## 36        C/C       T/T
## 37        C/C       T/T
## 38        C/T       T/T
## 39        C/T       T/T
## 40        T/T       T/T
## 41        T/T       T/T
## 42        T/T       T/T
## 43        T/T       T/T
## 44        C/T       T/T
## 45        T/T       T/T
## 46        C/C       T/T
## 47        C/T       T/T
## 48        C/T       T/T
## 49        C/T       T/T
## 50        T/T       T/T
## 51        T/T       T/T
## 52        C/T       T/T
## 53        C/T      <NA>
## 54        T/T      <NA>
## 55        T/T       T/T
## 56        T/T       T/T
## 57        T/T       T/T
## 58        T/T       T/T
## 59        C/T       T/T
## 60        T/T       T/T
## 61        T/T       T/T
## 62        T/T       T/T
## 63        T/T       T/T
## 64        T/T       T/T
## 65        T/T       T/T
## 66        T/T       T/T
## 67        C/C       T/T
## 68        T/T       T/T
## 69        T/T       T/T
## 70        C/T       T/T
## 71        C/T       T/T
## 72        T/T       T/T
## 73        T/T       T/T
## 74        C/T      <NA>
## 75        T/T      <NA>
## 76        T/T       T/T
## 77        T/T       T/T
## 78        T/T      <NA>
## 79        T/T       T/T
## 80        C/T       T/T
## 81        C/T       T/T
## 82        T/T       T/T
## 83        T/T      <NA>
## 84        T/T       T/T
## 85        C/C       T/T
## 86        T/T       T/T
## 87        T/T       T/T
## 88        C/C       T/T
## 89        C/C       T/T
## 90        C/T       T/T
## 91        T/T       T/T
## 92        T/T       T/T
## 93        C/T       T/T
## 94        T/T       T/T
## 95        T/T       T/T
## 96        T/T       T/T
## 97        T/T       T/T
## 98        T/T       T/T
## 99        C/T       T/T
## 100       T/T       T/T
## 101       C/T       T/T
## 102       C/T       T/T
## 103       C/T       T/T
## 104       T/T       T/T
## 105       T/T       T/T
## 106       C/C       T/T
## 107       C/T       T/T
## 108       T/T       T/T
## 109       T/T       T/T
## 110       C/T       T/T
## 111       T/T       T/T
## 112       C/T       T/T
## 113       C/T       T/T
## 114       T/T       T/T
## 115       C/T       T/T
## 116       C/T       T/T
## 117       T/T       T/T
## 118       T/T       T/T
## 119       C/C       T/T
## 120       C/T       T/T
## 121       C/T       T/T
## 122       T/T       T/T
## 123       C/C       T/T
## 124       T/T       T/T
## 125       T/T       T/T
## 126       T/T       T/T
## 127       T/T       T/T
## 128       T/T       T/T
## 129       C/T       T/T
## 130       T/T       T/T
## 131       C/T       T/T
## 132       C/C       T/T
## 133       T/T       T/T
## 134       T/T       T/T
## 135       T/T       T/T
## 136       T/T       T/T
## 137      <NA>      <NA>
## 138       T/T       T/T
## 139       T/T       T/T
## 140       C/T       T/T
## 141       C/T       T/T
## 142       C/T      <NA>
## 143       T/T       T/T
## 144       T/T       T/T
## 145       T/T       T/T
## 146       C/T       T/T
## 147       C/T       T/T
## 148       T/T       T/T
## 149       T/T       T/T
## 150       T/T      <NA>
## 151       T/T       T/T
## 152       C/T       T/T
## 153       T/T       T/T
## 154       C/T       T/T
## 155       C/T       T/T
## 156       C/T       T/T
## 157       T/T       T/T

How to form a contingency table.

xtabs is a function in the stats package, which is part of the standard R distribution. It creates a contingency table (optionally a sparse matrix) from cross-classifying factors, usually contained in a data frame, using a formula interface.

# Generate a contingency table

xtabs(~ casco + snp10001, data=datSNP)
##      snp10001
## casco T/T C/T C/C
##     0  24  21   2
##     1  68  32  10

In other words, the call above cross-classifies the casco and snp10001 columns of datSNP (~ casco + snp10001) and counts the observations that fall into each cell. Its usage is as follows:

xtabs(formula = ~., data = parent.frame(), subset, sparse = FALSE, na.action, addNA = FALSE, exclude = if(!addNA) c(NA,NaN), drop.unused.levels = FALSE)

formula is the formula object with the cross-classifying variables separated by +. Interactions are not allowed.

data is an optional matrix or data frame containing the variables in formula.

subset is an optional vector specifying a subset of observations to be used.

sparse is a logical specifying whether the result should be a sparse matrix, i.e., inheriting from sparseMatrix. It only works for two factors (since there are no higher-order sparse array classes yet).

na.action is a function that indicates what should happen when the data contain NAs. If unspecified, and addNA is true, this is set to na.pass. When it is na.pass and formula has a left hand side (with counts), sum(*, na.rm = TRUE) is used instead of sum(*) for the counts.

With this contingency table, you can see how many observations fall into each category.
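To see how the NA-related arguments described above behave, here is a minimal base R sketch. The toy data frame and its values are made up for illustration and are not part of datSNP.

```r
# Toy data (made up): case/control status and a genotype with one missing value.
toy <- data.frame(
  casco = c(0, 0, 1, 1, 1, 0),
  geno  = factor(c("T/T", "C/T", "T/T", NA, "C/C", "T/T"),
                 levels = c("T/T", "C/T", "C/C"))
)

# Default behaviour: the row with the missing genotype is left out of the counts.
tab_default <- xtabs(~ casco + geno, data = toy)

# addNA = TRUE keeps missing values as a category of their own.
tab_with_na <- xtabs(~ casco + geno, data = toy, addNA = TRUE)

print(tab_default)
print(tab_with_na)
```

Comparing the two tables shows whether missing genotypes are quietly shrinking the sample size.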

Now test for independence between casco and snp10001.

independence_test(casco~snp10001, data=datSNP, teststat="quadratic", scores=list(snp10001=c(0,1,2)))
## 
##  Asymptotic General Independence Test
## 
## data:  casco by snp10001 (T/T < C/T < C/C)
## chi-squared = 0.28459, df = 1, p-value = 0.5937

Let’s delve into the function independence_test included in the coin package. It provides a general independence test for two sets of variables measured on arbitrary scales. This function is based on the general framework for conditional inference procedures proposed by Strasser and Weber (1999). The salient parts of the Strasser-Weber framework are elucidated by Hothorn et al. (2006) and a thorough description of the software implementation is given by Hothorn et al. (2008).
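To make the trend test less of a black box, here is a sketch that recomputes it by hand in base R from the counts in the contingency table above. The variable names are mine, and the (n - 1) scaling is my reading of why coin's value differs slightly from the classical Cochran-Armitage statistic, so treat it as an assumption rather than a description of coin's internals.

```r
# Genotype counts copied from the contingency table above (T/T, C/T, C/C).
controls <- c(24, 21, 2)
cases    <- c(68, 32, 10)
score    <- c(0, 1, 2)

# Expand the table into one value per individual.
x <- rep(c(score, score), times = c(controls, cases))    # genotype score
y <- rep(c(0, 1), times = c(sum(controls), sum(cases)))  # 0 = control, 1 = case

n <- length(y)
r <- cor(x, y)

catt_stat <- n * r^2        # classical Cochran-Armitage chi-squared, 1 df
perm_stat <- (n - 1) * r^2  # assumed permutation-based scaling

p_catt <- pchisq(catt_stat, df = 1, lower.tail = FALSE)
p_perm <- pchisq(perm_stat, df = 1, lower.tail = FALSE)

# Base R offers the classical version directly.
base_test <- prop.trend.test(x = cases, n = cases + controls, score = score)

print(c(catt = catt_stat, perm = perm_stat, p_catt = p_catt, p_perm = p_perm))
print(base_test)
```

Either way, the statistic is just the squared correlation between case status and genotype score, scaled by the sample size.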

datSNP$snp10001
##   [1] T/T T/T T/T C/T T/T T/T T/T T/T C/T T/T C/T C/C C/T T/T T/T T/T C/T T/T
##  [19] T/T T/T T/T C/C C/T T/T T/T C/T T/T T/T T/T C/C T/T T/T C/T T/T C/T C/T
##  [37] C/C C/T C/T T/T T/T T/T T/T C/T T/T C/C C/T C/T C/T T/T T/T C/T C/T T/T
##  [55] T/T T/T T/T T/T C/T T/T T/T T/T T/T T/T T/T T/T C/C T/T T/T C/T C/T T/T
##  [73] T/T C/T T/T T/T T/T T/T T/T C/T C/T T/T T/T T/T C/C T/T T/T C/C C/C C/T
##  [91] T/T T/T C/T T/T T/T T/T T/T T/T C/T T/T C/T C/T C/T T/T T/T C/C C/T T/T
## [109] T/T C/T T/T C/T C/T T/T C/T C/T T/T T/T C/T C/T C/T T/T C/C T/T T/T T/T
## [127] T/T T/T C/T T/T C/T C/C T/T C/T C/T T/T C/T T/T T/T C/T C/T C/T T/T T/T
## [145] T/T T/T C/T T/T T/T T/T T/T C/T T/T C/T C/T C/T C/T
## Genotypes: T/T C/T C/C
## Alleles:  T C
scores = list(snp10001=c(0,1,2))
scores
## $snp10001
## [1] 0 1 2

How can we tell whether a factor is ordered or nominal?

Calling min() on snp10001 doesn’t work, which is what we expect from an unordered factor.

head(datSNP$snp10001)
## [1] T/T T/T T/T C/T T/T T/T
## Genotypes: T/T C/T C/C
## Alleles:  T C

R prints the levels without any ordering sign, which means the factor has no ordering.

For ordinal variables, R indicates the order using < when printing the levels.

So it is clear that the snp10001 column is an unordered (nominal) factor.

Let’s make an ordered factor as an example.

status <- c("Hi", "Hi", "Lo", "Hi", "Med", "Lo", "Med", "Med", "Med", "Hi")
ordered.status <- factor(status, levels=c("Lo", "Med", "Hi"), ordered=TRUE)
ordered.status
##  [1] Hi  Hi  Lo  Hi  Med Lo  Med Med Med Hi 
## Levels: Lo < Med < Hi

For ordered.status, it works.

min(ordered.status)
## [1] Lo
## Levels: Lo < Med < Hi

So the reason we pass the scores argument to independence_test is that snp10001 is a nominal factor without any ordering. The scores tell the test to coerce it into class ordered, with a specific numeric value attached to each level.
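The checks above can be put together in a short base R sketch. It uses a plain factor as a stand-in for the snp column (an assumption, since the real column also carries the snp class), and is.ordered() as a more direct test than min().

```r
# A plain nominal factor standing in for datSNP$snp10001.
geno <- factor(c("T/T", "C/T", "C/C", "T/T"), levels = c("T/T", "C/T", "C/C"))

is.ordered(geno)                             # nominal factor
min_result <- try(min(geno), silent = TRUE)  # min() is not defined for nominal factors
inherits(min_result, "try-error")

# The same data as an ordered factor.
geno_ord <- factor(geno, levels = levels(geno), ordered = TRUE)
is.ordered(geno_ord)
min(geno_ord)

# Attaching explicit scores to the levels, in level order.
scores <- c(0, 1, 2)
geno_score <- scores[as.integer(geno)]
print(geno_score)
```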

Refer to this article and look for scores for further information.

4-4. Regression Analysis

If the dependent variable is continuous, like height, weight, or BMI, we can’t simply split the samples into case and control groups, so we use regression analysis instead. We assign risk scores of 0, 1, and 2 to the reference homozygote, the heterozygote, and the variant homozygote respectively, which assumes an additive model where the phenotypic risk increases with the number of variant alleles. We then fit a linear regression for a continuous phenotype, or a logistic regression for a binary one such as case/control status.

additive(datSNP$snp10001)
##   [1] 0 0 0 1 0 0 0 0 1 0 1 2 1 0 0 0 1 0 0 0 0 2 1 0 0 1 0 0 0 2 0 0 1 0 1 1 2
##  [38] 1 1 0 0 0 0 1 0 2 1 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 1 1 0 0 1
##  [75] 0 0 0 0 0 1 1 0 0 0 2 0 0 2 2 1 0 0 1 0 0 0 0 0 1 0 1 1 1 0 0 2 1 0 0 1 0
## [112] 1 1 0 1 1 0 0 1 1 1 0 2 0 0 0 0 0 1 0 1 2 0 1 1 0 1 0 0 1 1 1 0 0 0 0 1 0
## [149] 0 0 0 1 0 1 1 1 1

additive is a function in SNPassoc that converts each genotype into a numeric score (0, 1, or 2) under the additive model.
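To show what such a score means, here is a base R sketch of a hypothetical helper, count_allele, that counts copies of a chosen allele in each genotype. It is not part of SNPassoc. It only mirrors the output above on the assumption that the score counts copies of the less frequent allele (C here).

```r
# Hypothetical helper (not from SNPassoc): count copies of `allele` per genotype.
count_allele <- function(genotypes, allele, sep = "/") {
  parts <- strsplit(as.character(genotypes), sep, fixed = TRUE)
  vapply(parts,
         function(a) if (anyNA(a)) NA_integer_ else sum(a == allele),
         integer(1))
}

geno <- c("T/T", "C/T", "C/C", NA)
print(count_allele(geno, "C"))
```

Missing genotypes stay missing instead of being scored as 0, which matters once the score goes into a regression.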

Linear regression

res <- lm(blood.pre ~ additive(snp10001), data=datSNP)
summary(res)
## 
## Call:
## lm(formula = blood.pre ~ additive(snp10001), data = datSNP)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.12566 -0.72566 -0.02566  0.77212  2.27434 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        12.92566    0.09991 129.373   <2e-16 ***
## additive(snp10001)  0.10222    0.12457   0.821    0.413    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9906 on 155 degrees of freedom
## Multiple R-squared:  0.004325,   Adjusted R-squared:  -0.002098 
## F-statistic: 0.6733 on 1 and 155 DF,  p-value: 0.4131

Logistic regression

res <- glm(casco ~ additive(snp10001), data=datSNP, family=binomial(link='logit'))
summary(res)
## 
## Call:
## glm(formula = casco ~ additive(snp10001), family = binomial(link = "logit"), 
##     data = datSNP)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5859  -1.5206   0.8180   0.8694   0.9227  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          0.9230     0.2230   4.139 3.48e-05 ***
## additive(snp10001)  -0.1447     0.2707  -0.535    0.593    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 191.64  on 156  degrees of freedom
## Residual deviance: 191.36  on 155  degrees of freedom
## AIC: 195.36
## 
## Number of Fisher Scoring iterations: 4

Take the coefficients from the logistic regression result (res) and exponentiate them with exp(), that is, raise e to the power of each coefficient.

exp(coef(res))
##        (Intercept) additive(snp10001) 
##          2.5167492          0.8652659

Interpretation

Linear regression finds a linear equation of the form \(y = a_1x_1 + a_2x_2 + \cdots + b\). The estimate 0.10222 is the coefficient for snp10001, so you can read it like this: “each additional variant allele increases blood.pre by 0.10222.” However, note that the p-value is 0.413, so this variable is not significant in explaining blood pressure. That is, we cannot reject the null hypothesis (\(H_0\)) that blood pressure does not vary with the number of snp10001 variant alleles.

Logistic regression finds the coefficients of a linear equation of the form \(\log\frac{y}{1-y} = a_1x_1 + a_2x_2 + \cdots + b\), where \(y\) is the probability of being a case, so the coefficient to interpret is additive(snp10001) = -0.1447. It means that each additional variant allele changes the log-odds of being a case by -0.1447. Exponentiating this coefficient gives the odds ratio:

exp(coef(res))
##        (Intercept) additive(snp10001) 
##          2.5167492          0.8652659

exp(-0.1447)
## [1] 0.8652818

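The two values differ slightly because summary() prints the coefficient rounded to four decimal places, while coef(res) keeps the full precision.

So each additional variant allele multiplies the odds of being a case by about 0.8653. Note that this is a statement about the odds, not the probability, and with a p-value of 0.593 the effect is not significant.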
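As a rough check on this interpretation, the sketch below turns the printed estimates into an odds ratio with a Wald 95% confidence interval, and into predicted probabilities for 0, 1, and 2 variant alleles. The numbers are copied from the summary output above, so they carry its rounding.

```r
# Estimates copied from the logistic regression summary above.
intercept <- 0.9230
beta      <- -0.1447
se        <- 0.2707

# Odds ratio per variant allele, with a Wald 95% confidence interval.
odds_ratio <- exp(beta)
ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)

# Odds and probability of being a case for 0, 1, and 2 variant alleles.
copies <- 0:2
odds <- exp(intercept + beta * copies)
prob <- odds / (1 + odds)  # same as plogis(intercept + beta * copies)

print(round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 3))
print(round(data.frame(copies, odds, prob), 3))
```

The interval contains 1, which agrees with the non-significant p-value, and the probabilities shrink by less than the odds do at each step.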

GWAS estimates odds ratios and runs contingency table tests on the difference in SNP genotype frequencies between the case group and the control group. (This means that candidate associated SNPs show up statistically more often in the case group.) However, besides genotype you also have to consider other physiological and environmental variables such as age and sex. Compared to contingency table testing, a regression model is a powerful way to adjust for these external variables.
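Here is a minimal sketch of what adjusting for covariates looks like in a logistic regression. All data are simulated inside the block, and the effect sizes and variable names are assumptions chosen for illustration, not values from the SNPs dataset.

```r
set.seed(1)
n <- 500

# Simulated predictors (assumptions, not real data).
geno <- rbinom(n, size = 2, prob = 0.3)  # 0, 1, or 2 variant alleles
age  <- round(runif(n, 30, 70))
sex  <- factor(sample(c("Female", "Male"), n, replace = TRUE))

# Simulated truth: genotype and age both raise the log-odds of being a case.
case <- rbinom(n, size = 1, prob = plogis(-3 + 0.4 * geno + 0.05 * age))
sim  <- data.frame(case, geno, age, sex)

# Genotype effect adjusted for age and sex.
fit <- glm(case ~ geno + age + sex, data = sim, family = binomial(link = "logit"))
adj_or <- exp(coef(fit))
print(round(adj_or, 3))
```

The geno entry of adj_or is the odds ratio per variant allele with age and sex held fixed, which a plain contingency table cannot give you.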

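# Plot the logistic function y = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))
x <- seq(-3, 3, by=0.2)
alpha <- 1.1  # intercept
beta <- 1.5   # slope
y <- exp(alpha + beta*x)/(1+exp(alpha + beta*x))
plot(x, y, type="b")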

4-5. Hardy-Weinberg Equilibrium (HWE) Test

You can do HWE testing with the SNPassoc package as well. The HWE test assumes a biallelic locus (not multi-allelic, i.e. no more than two alternative forms of a gene) and checks whether the three genotypes observed at the locus follow the frequencies expected under HWE. P-value thresholds typically range from 0.001 to 0.00001, and SNPs that fall below the chosen threshold are filtered out.

First, load the SNPassoc library and the SNPs data.

library(SNPassoc)
data(SNPs)
pre_SNPs <- snp(SNPs$snp10005, sep="")
summary(pre_SNPs) # summary of a factor results in frequency and percentage table of the factor categories
## Genotypes: 
##     frequency percentage
## G/G        84  53.503185
## A/G        70  44.585987
## A/A         3   1.910828
## 
## Alleles: 
##   frequency percentage
## G       238   75.79618
## A        76   24.20382
## 
## HWE (p value): 0.008019904
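To see where an HWE p-value comes from, here is a base R sketch that runs a chi-squared goodness-of-fit version of the test on the genotype counts printed above. As far as I can tell, the p-value that SNPassoc reports comes from an exact test, so this approximation is not expected to match it digit for digit.

```r
# Genotype counts copied from the summary above.
obs <- c(GG = 84, AG = 70, AA = 3)
n <- sum(obs)

# Allele frequencies estimated from the genotype counts.
p <- (2 * obs[["GG"]] + obs[["AG"]]) / (2 * n)  # frequency of G
q <- 1 - p                                      # frequency of A

# Genotype counts expected under HWE: p^2, 2pq, q^2.
expected <- n * c(GG = p^2, AG = 2 * p * q, AA = q^2)

# Chi-squared statistic: 3 genotype classes - 1 - 1 estimated frequency = 1 df.
chi2 <- sum((obs - expected)^2 / expected)
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)

print(round(expected, 2))
print(c(chi2 = chi2, p_value = p_value))
```

The shortage of A/A homozygotes relative to the expected count is what drives the deviation.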

pre_SNPs
##   [1] G/G A/G G/G G/G G/G G/G A/G A/G G/G A/G A/G G/G G/G A/G A/G G/G G/G A/G
##  [19] G/G A/G G/G G/G G/G G/G A/G G/G A/G A/G A/G G/G A/G A/G A/G G/G G/G G/G
##  [37] G/G A/G A/G G/G G/G G/G A/G A/G A/G G/G G/G G/G G/G A/G G/G G/G G/G G/G
##  [55] A/G G/G A/G G/G A/G G/G G/G A/G A/G A/G A/G A/A G/G G/G G/G A/G G/G G/G
##  [73] A/G G/G A/A A/G G/G A/G A/G G/G A/G A/G G/G A/G G/G G/G A/G G/G G/G G/G
##  [91] G/G A/A A/G G/G G/G A/G G/G A/G A/G A/G G/G G/G G/G A/G A/G G/G A/G A/G
## [109] A/G G/G G/G A/G A/G A/G A/G A/G G/G A/G G/G G/G A/G G/G G/G G/G G/G A/G
## [127] G/G A/G G/G G/G A/G G/G G/G G/G A/G A/G A/G G/G G/G G/G G/G G/G G/G A/G
## [145] A/G G/G A/G G/G G/G G/G G/G A/G A/G A/G A/G A/G A/G
## Genotypes: G/G A/G A/A
## Alleles:  G A

class(pre_SNPs)
## [1] "snp"    "factor"

The class of pre_SNPs is both snp and factor.

You can also plot an snp object, with the height of each bar representing the genotype’s frequency.

plot(pre_SNPs, label="snp10005", col="red")

To run the HWE test on many SNPs at once, use the function tableHWE(). If you pass a threshold to the sig argument of print(), SNPs with a p-value below it are flagged with <-.

myData <- setupSNP(data=SNPs, colSNPs=6:40, sep="")
res <- tableHWE(myData)
print(res, sig=0.001)
##           HWE (p value) flag
## snp10001  0.2816            
## snp10002  0.0049            
## snp10003  -                 
## snp10004  -                 
## snp10005  0.0080            
## snp10006  -                 
## snp10007  -                 
## snp10008  0.1378            
## snp10009  0.0028            
## snp100010 -                 
## snp100011 0.0191            
## snp100012 0.0134            
## snp100013 0.0256            
## snp100014 1.0000            
## snp100015 -                 
## snp100016 -                 
## snp100017 0.0005        <-  
## snp100018 0.0005        <-  
## snp100019 0.7463            
## snp100020 0.1254            
## snp100021 -                 
## snp100022 -                 
## snp100023 0.0028            
## snp100024 0.0922            
## snp100025 -                 
## snp100026 -                 
## snp100027 0.0009        <-  
## snp100028 0.4197            
## snp100029 0.0487            
## snp100030 -                 
## snp100031 -                 
## snp100032 0.2589            
## snp100033 0.3264            
## snp100034 0.0487            
## snp100035 -

myData$snp10003
##   [1] G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G 
##  [16] G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  <NA> G/G  G/G  G/G 
##  [31] G/G  G/G  G/G  <NA> G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  <NA>
##  [46] G/G  G/G  G/G  G/G  G/G  G/G  G/G  <NA> G/G  G/G  G/G  G/G  G/G  G/G  G/G 
##  [61] G/G  G/G  G/G  G/G  G/G  <NA> G/G  G/G  G/G  G/G  G/G  G/G  G/G  <NA> <NA>
##  [76] G/G  G/G  <NA> G/G  G/G  G/G  G/G  <NA> G/G  G/G  G/G  G/G  G/G  G/G  G/G 
##  [91] G/G  <NA> G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G 
## [106] G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G 
## [121] G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G 
## [136] <NA> <NA> G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  G/G  <NA>
## [151] G/G  G/G  G/G  G/G  G/G  G/G  G/G 
## Genotypes: G/G
## Alleles:  G

All right. Let’s interpret the above table. The <- flag means that the SNP deviates significantly from HWE. Generally, such SNPs are removed during quality control. SNPs are filtered out when the variant is rarely or never observed in the samples (low MAF, minor allele frequency), when variant calling didn’t go well (missingness > 0.02), or when the HWE test’s p-value is significant (HWE < 0.001). Only the SNPs that pass these filters move on to the next steps of the analysis. This is how you remove noise.

For SNPs that have only one genotype, the test result comes back as -. You can also split the samples into groups by a certain criterion (strata is the argument you pass to stratify the population) and test each group separately. Let’s stratify by sex. As before, SNPs that show a significant deviation would be removed.
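The filtering step can be sketched in base R as follows. The per-SNP summary table is hypothetical (names and numbers are made up), and the MAF cutoff of 0.01 is an assumption, since the text above gives thresholds only for missingness and HWE.

```r
# Hypothetical per-SNP quality summary (all values made up).
qc <- data.frame(
  snp     = c("snpA", "snpB", "snpC", "snpD"),
  maf     = c(0.24, 0.00, 0.31, 0.12),    # minor allele frequency
  missing = c(0.00, 0.01, 0.05, 0.01),    # fraction of missing genotype calls
  hwe_p   = c(0.28, NA, 0.40, 0.0005),    # NA: monomorphic, HWE not testable
  stringsAsFactors = FALSE
)

maf_min     <- 0.01   # assumed cutoff, not from the text
missing_max <- 0.02
hwe_min     <- 0.001

keep <- qc$maf >= maf_min &
  qc$missing <= missing_max &
  !is.na(qc$hwe_p) & qc$hwe_p >= hwe_min

passed <- qc$snp[keep]
print(passed)
```

Keeping the cutoffs in named variables makes it easy to report exactly which filters were applied.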

res <- tableHWE(myData, strata=myData$sex)
res
##           all.groups   Male Female
## snp10001      0.2816 0.3941 0.7388
## snp10002      0.0049 0.1660 0.0075
## snp10003           -      -      -
## snp10004           -      -      -
## snp10005      0.0080 0.2755 0.0257
## snp10006           -      -      -
## snp10007           -      -      -
## snp10008      0.1378 0.5078 0.2342
## snp10009      0.0028 0.0992 0.0075
## snp100010          -      -      -
## snp100011     0.0191 1.0000 0.0184
## snp100012     0.0134 0.2761 0.0255
## snp100013     0.0256 0.1206 0.2051
## snp100014     1.0000 0.8101 0.6456
## snp100015          -      -      -
## snp100016          -      -      -
## snp100017     0.0005 0.0304 0.0068
## snp100018     0.0005 0.0304 0.0066
## snp100019     0.7463 1.0000 0.5012
## snp100020     0.1254 0.5078 0.2141
## snp100021          -      -      -
## snp100022          -      -      -
## snp100023     0.0028 0.0972 0.0123
## snp100024     0.0922 0.1551 0.5197
## snp100025          -      -      -
## snp100026          -      -      -
## snp100027     0.0009 0.0304 0.0123
## snp100028     0.4197 1.0000 0.2619
## snp100029     0.0487 0.0772 0.5065
## snp100030          -      -      -
## snp100031          -      -      -
## snp100032     0.2589 0.8170 0.1834
## snp100033     0.3264 0.8139 0.2619
## snp100034     0.0487 0.0772 0.5065
## snp100035          -      -      -

4-6. Manhattan Graph

GWAS uses a Manhattan graph to visualize the results of statistical tests on a large number of variants across a large number of samples. The next example uses the qqman library to draw the data contained in gwasResults as a Manhattan graph.

The qqman package includes functions for creating Manhattan plots and Q-Q plots from GWAS results. Install qqman first.

qqman is available from the CRAN repository, so you don’t actually need BiocManager to install it, although installing it through BiocManager also worked for me.

So, before we go into the depths of this package, let’s get one thing straight first. What’s a Q-Q plot? A Q-Q plot (quantile-quantile plot) compares two probability distributions by plotting their quantiles against each other. Quantiles are the cutpoints that divide a distribution into sections of equal probability.

If the two distributions are not the same, the Q-Q plot deviates from the \(y=x\) line.

If the two distributions are the same, the points of the Q-Q plot lie on the straight \(y=x\) line.
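For GWAS p-values, the comparison is usually between the observed p-values and the uniform distribution expected under the null hypothesis, both on the \(-\log_{10}\) scale. Here is a base R sketch with simulated null p-values. It does not use qqman, and the simulated data are an assumption for illustration.

```r
set.seed(42)
n <- 1000

# Simulated p-values with no true association: uniform on (0, 1).
p_null <- runif(n)

# Observed and expected quantiles on the -log10 scale, largest first.
observed <- sort(-log10(p_null), decreasing = TRUE)
expected <- -log10(ppoints(n))

plot(expected, observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)",
     main = "Q-Q plot of simulated null p-values")
abline(0, 1, col = "red")  # the y = x reference line
```

With real GWAS results, points that rise above the line at the far right are the candidate associations.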


The gwasResults data.frame included with the package has simulated results for 16,470 SNPs on 22 chromosomes. Take a look at the data:

library(qqman)
## 
## For example usage please run: vignette('qqman')
## 
## Citation appreciated but not required:
## Turner, S.D. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. biorXiv DOI: 10.1101/005165 (2014).
## 
head(gwasResults)
##   SNP CHR BP         P
## 1 rs1   1  1 0.9148060
## 2 rs2   1  2 0.9370754
## 3 rs3   1  3 0.2861395
## 4 rs4   1  4 0.8304476
## 5 rs5   1  5 0.6417455
## 6 rs6   1  6 0.5190959

The data consists of the SNP ID (rsID), the chromosome, the variant position, and the p-value from the statistical test. The x-axis of a Manhattan graph lists variants by chromosome order and by position within each chromosome, and the y-axis shows \(-\log_{10}\) of the p-value. The p-value thresholds generally used for GWAS are \(5 \times 10^{-8}\) (genome-wide significance) and \(10^{-5}\) (suggestive significance).

Select the loci where the p-value is less than \(5 \times 10^{-8}\).

sig_loci_gwas <- gwasResults[gwasResults$P < 5e-8, ]
sig_loci_gwas
##         SNP CHR  BP            P
## 3040 rs3040   3 349 2.672902e-08
## 3050 rs3050   3 359 7.803872e-09
## 3054 rs3054   3 363 3.892246e-08
## 3056 rs3056   3 365 7.807433e-09
## 3057 rs3057   3 366 4.438922e-09
## 3060 rs3060   3 369 2.488724e-08
sig_loci_gwas[order(sig_loci_gwas$SNP), ]
##         SNP CHR  BP            P
## 3040 rs3040   3 349 2.672902e-08
## 3050 rs3050   3 359 7.803872e-09
## 3054 rs3054   3 363 3.892246e-08
## 3056 rs3056   3 365 7.807433e-09
## 3057 rs3057   3 366 4.438922e-09
## 3060 rs3060   3 369 2.488724e-08

You can see that the significant SNPs all lie on chromosome 3, between rs3040 and rs3060 (positions 349 to 369).
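To connect these p-values to the vertical axis of the Manhattan graph, the sketch below applies the \(-\log_{10}\) transform to the six SNPs listed above and compares them with the two thresholds. The p-values are copied from the output, and the variable names are mine.

```r
# The six significant SNPs, copied from the output above.
hits <- data.frame(
  SNP = c("rs3040", "rs3050", "rs3054", "rs3056", "rs3057", "rs3060"),
  P   = c(2.672902e-08, 7.803872e-09, 3.892246e-08,
          7.807433e-09, 4.438922e-09, 2.488724e-08),
  stringsAsFactors = FALSE
)

# Height of each point on the Manhattan graph.
hits$logp <- -log10(hits$P)

# Heights of the two threshold lines.
genomewide_line <- -log10(5e-8)
suggestive_line <- -log10(1e-5)

print(hits[order(hits$logp, decreasing = TRUE), ])
print(c(genomewide = genomewide_line, suggestive = suggestive_line))
```

Smaller p-values end up higher on the plot, so the top-ranked row is the tallest point in the chromosome 3 peak.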

manhattan(gwasResults, main="Manhattan Plot", cex = 0.5, cex.axis=0.8, col = c("blue4", "orange3"))

You can also print out the genotypes, statistics for each allele, and HWE test results as bar plots.